ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-time Systems
Authors
Abstract
This paper introduces a new portable communication library called ARMCI. ARMCI provides one-sided communication capabilities for distributed array libraries and compiler run-time systems. It supports remote memory copy, accumulate, and synchronization operations optimized for non-contiguous data transfers including strided and generalized UNIX I/O vector interfaces. The library has been employed in the Global Arrays shared memory programming toolkit and Adlib, a Parallel Compiler Run-time Consortium run-time system.
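As a rough illustration of what a strided transfer descriptor expresses, the sketch below copies a rectangular patch between two flat byte buffers in local memory. It is not ARMCI code and makes no remote calls. The function name `strided_copy` and its parameter layout are assumptions chosen for illustration: `count[0]` is the length in bytes of each contiguous segment, and each later entry of `count` gives the number of repetitions at one stride level.

```python
from itertools import product


def strided_copy(src, src_off, src_strides, dst, dst_off, dst_strides, count):
    """Copy a multi-level strided patch between two flat byte buffers.

    Hypothetical helper, local memory only:
      count[0]          bytes in each contiguous segment
      count[i], i >= 1  number of segments at stride level i
      *_strides[i - 1]  byte stride between segments at level i
    """
    seg = count[0]
    levels = count[1:]
    if len(src_strides) != len(levels) or len(dst_strides) != len(levels):
        raise ValueError("need exactly one stride per stride level")
    # Visit every combination of segment indices across the stride levels.
    for idx in product(*(range(n) for n in levels)):
        s = src_off + sum(i * st for i, st in zip(idx, src_strides))
        d = dst_off + sum(i * st for i, st in zip(idx, dst_strides))
        dst[d:d + seg] = src[s:s + seg]


# A 4x6 row-major byte matrix; pack the 2x3 patch starting at row 1, column 2.
src = bytearray(range(24))
dst = bytearray(6)
strided_copy(src, 1 * 6 + 2, [6], dst, 0, [3], [3, 2])
```

A single descriptor of this kind describes the whole non-contiguous patch, so a communication library can move it in one operation rather than issuing one request per row.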
Related resources
High Performance Remote Memory Access Communication: The ARMCI Approach
This paper describes the Aggregate Remote Memory Copy Interface (ARMCI), a portable high performance remote memory access communication interface, developed originally under the U.S. Department of Energy (DOE) Advanced Computational Testing and Simulation Toolkit project and currently used and advanced as a part of the run-time layer of the DOE project, Programming Models for Scalable Parallel ...
Interfacing Global Arrays and ARMCI with the PCRC library, Adlib
This document reports work undertaken at NPAC, Syracuse under the DOE Global Array Extension Project. This work was intended to investigate the feasibility of interfacing, and perhaps eventually integrating, GA, ARMCI and the Parallel Compiler Runtime Consortium library, Adlib. In particular, we have reimplemented parts of the Adlib library in terms of ARMCI, and also produced a version of GA wh...
A profiler and compiler for the Wonka Virtual Machine
Developed at ACUNIA, Wonka is an open source Virtual Machine (VM) for executing Java bytecode methods. Wonka is a self-contained, clean-room VM implementation designed for use in embedded systems and is targeted towards Java 2 compliance: it has no dependence on external libraries (except for libc) and comes with all the essential standard class libraries (Foundation Profile/CDC), including an Abstrac...
A Portable Run-Time System for
This paper describes a parallel run-time system (RTS) that is used as part of the pC++ parallel programming language. The RTS has been implemented on a variety of scalable MPP computers. The system differs from other data-parallel RTS implementations; it is designed to support the operations from object-parallel programming that require remote member function execution and load and store operations ...
A scalable replay-based infrastructure for the performance analysis of one-sided communication
Partitioned global address space (PGAS) languages combine the convenient abstraction of shared memory with the notion of affinity, extending multi-threaded programming to large-scale systems with physically distributed memory. However, in spite of their obvious advantages, PGAS languages still lack appropriate tool support for performance analysis, one of the reasons why their adoption is still...